feat: add fastCRW tool block by us · Pull Request #5025 · simstudioai/sim

us · 2026-06-13T17:21:36Z

What

Adds fastCRW as a tool block (scrape / crawl / map / search), mirroring the existing Firecrawl block.

Why

fastCRW is a genuinely more open, faster, and higher-quality web engine than Firecrawl — and it runs completely locally.

Full capability in open core, runs 100% locally: Anti-bot/stealth bypass, BYO-proxy with rotation, and JS rendering all ship in the open core (AGPL). Firecrawl's OSS build gates its stealth engine (fire-engine) behind a cloud-only flag — so a self-hosted Firecrawl cannot reach Cloudflare-protected or JS-heavy sites. fastCRW's self-host can. One binary, no cloud dependency, no asterisks.
Faster + higher-quality on Firecrawl's own benchmark dataset: truth-recall 63.74% vs 56.04%, with lower median latency (p50 ~1.9 s vs ~2.3 s). Ships as a single ~8 MB Rust binary using ~6 MB RAM.
Search built on SearXNG, not just backed by it: crw is not an alternative to SearXNG — it is built on top of it. SearXNG is the metasearch aggregator underneath; crw adds a quality layer: query expansion (multi-variant rewrite), content-aware reranking (re-scoring by fetched content instead of SearXNG's content-blind ordering), and category routing (research queries fan out to arxiv/semantic scholar, code queries to GitHub). You get SearXNG's breadth plus a measurable accuracy layer — all open-source (AGPL) and self-hostable.

Firecrawl-API compatibility is why the integration is a tiny additive diff that slots in alongside the existing Firecrawl block with no regressions.

Changes (additive only)

apps/sim/tools/crw/: scrape/crawl/map/search + types (mirrors tools/firecrawl/).
apps/sim/blocks/blocks/crw.ts + registered in blocks/registry.ts, tools/registry.ts.
Icon, CSP allowlist entry, BYOK key entry, integrations.json — every place Firecrawl is registered.

Config

CRW_API_KEY from https://fastcrw.com/dashboard (free tier); base URL overridable for self-host.

Happy to adjust — I maintain the integration and can provide free credits.

vercel · 2026-06-13T17:21:43Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Skipped		Jun 19, 2026 9:40pm

cursor · 2026-06-13T17:21:46Z

PR Summary

Low Risk
Purely additive integration with no changes to existing providers; main runtime surface is outbound API calls and crawl polling, matching established Firecrawl patterns.

Overview
Adds fastCRW as a new web data integration alongside Firecrawl: a workflow block with scrape, search, crawl, and map operations, wired to four new tools that call Firecrawl-compatible /v1/* endpoints on https://fastcrw.com/api (or a user-supplied Base URL for self-host).

Registration is additive everywhere Firecrawl already appears: block and tool registries, integrations.json, BYOK (crw under Search & web), BYOKProviderId / API contracts, icon mapping, and CSP connect-src for https://fastcrw.com. Tools are BYOK-only (CRW_API_KEY, zero Sim metering); crawl creates an async job and polls until completion or the execution timeout. Block meta adds templates and agent skills; crw.test.ts covers URL resolution, request shaping, and response mapping.
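The base-URL behaviour described above (a hosted default with a user-supplied override for self-host) could look roughly like the sketch below. The default URL comes from the summary; the helper name `resolveCrwBaseUrl` appears in the review, but its signature and the `crwEndpoint` wrapper shown here are assumptions for illustration, not the PR's actual code.

```typescript
// Default host taken from the PR summary; everything else is a hedged sketch.
const DEFAULT_CRW_BASE_URL = 'https://fastcrw.com/api'

// Assumed signature: an optional override, falling back to the hosted API.
function resolveCrwBaseUrl(baseUrl?: string): string {
  const trimmed = baseUrl?.trim()
  if (!trimmed) return DEFAULT_CRW_BASE_URL
  // Strip trailing slashes so `${base}/v1/...` never contains a double slash.
  return trimmed.replace(/\/+$/, '')
}

// Hypothetical convenience wrapper (not in the PR) that builds a /v1/* URL.
function crwEndpoint(path: string, baseUrl?: string): string {
  return `${resolveCrwBaseUrl(baseUrl)}/v1/${path}`
}
```

A self-hosted instance would then pass its own origin, for example `crwEndpoint('scrape', 'http://localhost:3000/')`.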

Reviewed by Cursor Bugbot for commit bb6839d. Bugbot is set up for automated code reviews on this repo.

cursor · 2026-06-13T17:24:08Z

+          formats: params.formats || ['markdown'],
+          onlyMainContent: params.onlyMainContent || false,
+        },
+      }


Crawl sends maxPages not limit

Medium Severity

The crawl request body sends maxPages, while fastCRW’s Firecrawl-compatible POST /v1/crawl expects limit for the page cap. The block’s Max Pages value is ignored and the service falls back to its default crawl size.
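If the reviewer's reading of the API is right and `POST /v1/crawl` expects `limit`, one possible fix is to keep `maxPages` as the block-facing name and translate it when building the request body. The sketch below is an assumption-laden illustration: `CrwCrawlParams`, `buildCrawlBody`, and the `scrapeOptions` wrapper key are hypothetical names, and only the `formats` and `onlyMainContent` defaults are taken from the diff fragment above.

```typescript
// Hypothetical parameter shape; the real tool params may differ.
interface CrwCrawlParams {
  url: string
  maxPages?: number
  formats?: string[]
  onlyMainContent?: boolean
}

// Builds the crawl request body, mapping the block's maxPages to the API's limit.
function buildCrawlBody(params: CrwCrawlParams): Record<string, unknown> {
  const body: Record<string, unknown> = {
    url: params.url,
    // `scrapeOptions` is assumed from Firecrawl's v1 request shape.
    scrapeOptions: {
      formats: params.formats ?? ['markdown'],
      onlyMainContent: params.onlyMainContent ?? false,
    },
  }
  // Only send the cap when the user set one, so the service default still applies otherwise.
  if (params.maxPages !== undefined) body.limit = params.maxPages
  return body
}
```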

Reviewed by Cursor Bugbot for commit 2964aed.

greptile-apps · 2026-06-13T17:28:16Z

Greptile Summary

This PR adds fastCRW as a new tool block (scrape / crawl / map / search), mirroring the existing Firecrawl block. The integration is additive-only: new files under tools/crw/ and blocks/blocks/crw.ts, plus registration in the block/tool registries, BYOK keys, CSP allowlist, icon, and integrations.json.

Four tool configs (crw_scrape, crw_search, crw_crawl, crw_map) mirror Firecrawl's structure with fastCRW-specific differences: maxPages instead of limit for crawl, a dynamic baseUrl param for self-hosting, and a resolveCrwBaseUrl helper.
Registration is complete across all required locations (BYOK schema, type union, CSP, icon mapping, integrations JSON), and a test file covers URL construction, body building, and response transformation for all four operations.

Confidence Score: 4/5

The change is purely additive and isolated to new files; no existing functionality is modified. The three tools with hardcoded success responses will silently swallow API-level errors, but they won't cause data corruption or affect other blocks.

Three of the four new tools (scrape, search, crawl) always return success: true from transformResponse even when the API body indicates failure — the crawl case is the worst because an undefined jobId leads the poll loop to request /v1/crawl/undefined, masking the real error. The fourth tool (map) handles this correctly, making the inconsistency self-contained within this PR. No other part of the codebase is touched.

apps/sim/tools/crw/scrape.ts, apps/sim/tools/crw/search.ts, apps/sim/tools/crw/crawl.ts — the transformResponse functions in all three need to check data.success before reporting a successful result.

Important Files Changed

Filename	Overview
apps/sim/blocks/blocks/crw.ts	New block config mirroring Firecrawl; routes scrape/search/crawl/map to the correct crw_* tools, formats params, and exposes baseUrl for self-hosting. Clean and consistent with existing block patterns.
apps/sim/tools/crw/scrape.ts	Scrape tool is structurally correct but hardcodes `success: true` in transformResponse regardless of API-level errors, unlike map.ts which properly checks data.success.
apps/sim/tools/crw/search.ts	Search tool also hardcodes `success: true` in transformResponse; same inconsistency with map.ts. Additionally, `limit` and `sources` params are used in the body builder but not declared in the tool's params definition (though this mirrors the Firecrawl search pattern).
apps/sim/tools/crw/crawl.ts	Crawl tool implements async polling correctly, but transformResponse ignores data.success — if job creation returns HTTP 200 with success:false, postProcess will poll /v1/crawl/undefined leading to a confusing 404 error instead of the real failure.
apps/sim/tools/crw/map.ts	Map tool correctly checks data.success in transformResponse and handles missing links with a fallback array. Well-structured and complete.
apps/sim/tools/crw/types.ts	Comprehensive type definitions and output property constants. Clean mirror of the Firecrawl types, with appropriate additions for fastCRW-specific fields.
apps/sim/tools/crw/crw.test.ts	Good coverage of URL construction, body building, and response transformation for all four operations. Tests document the expected API response shapes clearly.
apps/sim/lib/core/security/csp.ts	Adds https://fastcrw.com to connect-src allowlist. Covers the full domain/origin, which is sufficient since the API lives at /api/v1/* on the same origin.
apps/sim/tools/crw/base-url.ts	Clean utility for resolving the base URL, with trailing-slash stripping and a sensible default. Well-tested.
apps/sim/lib/api/contracts/byok-keys.ts	Correctly adds 'crw' to the BYOK provider ID zod schema enum.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[CrwBlock - crw.ts] -->|operation=scrape| B[crw_scrape tool]
    A -->|operation=search| C[crw_search tool]
    A -->|operation=crawl| D[crw_crawl tool]
    A -->|operation=map| E[crw_map tool]

    B --> F["POST /v1/scrape\n(fastcrw.com/api)"]
    C --> G["POST /v1/search\n(fastcrw.com/api)"]
    D --> H["POST /v1/crawl\n(fastcrw.com/api)"]
    E --> I["POST /v1/map\n(fastcrw.com/api)"]

    D -->|async job| J[postProcess polling loop]
    J --> K["GET /v1/crawl/{jobId}"]
    K -->|completed| L[Return pages + total]
    K -->|failed| M[Return error]
    K -->|timeout| N[Return timeout error]

    B --> O[transformResponse - always success:true]
    C --> P[transformResponse - always success:true]
    E --> Q[transformResponse - checks data.success]
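The polling branch of the flowchart (completed, failed, timeout) can be sketched as a small decision function plus a loop. The status shapes, `evaluateCrawlStatus`, and `pollCrawl` are hypothetical names modeled on the diagram, not the PR's `postProcess` implementation; the status fetcher is injected so the sketch makes no assumptions about the HTTP layer.

```typescript
// Status shapes are assumptions modeled on the flowchart, not the real API types.
type CrawlStatus =
  | { status: 'scraping' }
  | { status: 'completed'; data: unknown[]; total: number }
  | { status: 'failed'; error?: string }

interface CrawlResult {
  success: boolean
  output: { pages: unknown[]; total: number }
  error?: string
}

// Returns a final result, or null when the caller should keep polling.
function evaluateCrawlStatus(
  status: CrawlStatus,
  elapsedMs: number,
  timeoutMs: number
): CrawlResult | null {
  if (status.status === 'completed') {
    return { success: true, output: { pages: status.data, total: status.total } }
  }
  if (status.status === 'failed') {
    return { success: false, output: { pages: [], total: 0 }, error: status.error ?? 'Crawl job failed' }
  }
  if (elapsedMs >= timeoutMs) {
    return { success: false, output: { pages: [], total: 0 }, error: `Crawl timed out after ${timeoutMs} ms` }
  }
  return null
}

// Polls the injected status fetcher until a terminal state or the timeout.
async function pollCrawl(
  getStatus: () => Promise<CrawlStatus>,
  intervalMs: number,
  timeoutMs: number
): Promise<CrawlResult> {
  let elapsedMs = 0
  for (;;) {
    const result = evaluateCrawlStatus(await getStatus(), elapsedMs, timeoutMs)
    if (result) return result
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
    elapsedMs += intervalMs
  }
}
```

Keeping the decision logic separate from the loop makes the three terminal branches easy to unit test without timers.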

Comments Outside Diff (1)

apps/sim/tools/crw/crawl.ts, line 623-634

transformResponse ignores API-level job creation failure

If the crawl POST returns HTTP 200 with { success: false, error: "…" }, transformResponse still returns success: true with jobId: undefined. postProcess then checks if (!result.success) (passes), and proceeds to poll ${baseUrl}/v1/crawl/undefined, which returns a 404 and surfaces a confusing "Failed to get crawl status: Not Found" error rather than the original creation error. Guard against this by checking data.success (or at least data.id) in transformResponse before the poll loop begins.
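The guard suggested above could be sketched as follows. This is an illustration under assumptions: `CrawlCreateBody`, `ToolResult`, and `toCrawlCreateResult` are hypothetical names, and the function takes the already-parsed JSON body rather than a `Response` to stay self-contained.

```typescript
// Assumed shape of the job-creation response body.
interface CrawlCreateBody {
  success?: boolean
  id?: string
  error?: string
}

// Hypothetical result envelope mirroring the success/output/error pattern in the review.
interface ToolResult<T> {
  success: boolean
  output: T
  error?: string
}

// Fails fast when creation failed or no job id came back, so polling never starts
// against /v1/crawl/undefined.
function toCrawlCreateResult(data: CrawlCreateBody): ToolResult<{ jobId?: string }> {
  if (data.success === false || !data.id) {
    return {
      success: false,
      output: {},
      error: data.error ?? 'Crawl job creation returned no id',
    }
  }
  return { success: true, output: { jobId: data.id } }
}
```

With this shape, the existing `if (!result.success)` check in the post-processing step would stop before the first status request and surface the original creation error.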

Reviews (1): Last reviewed commit: "feat: add fastCRW tool block"

greptile-apps · 2026-06-13T17:28:20Z

+    const result = data.data ?? data
+
+    return {
+      success: true,
+      output: {
+        markdown: result.markdown,
+        html: result.html,
+        metadata: result.metadata,
+      },
+    }
+  },
+
+  outputs: {


Scrape/search always report success: true regardless of API error body

Both scrape.ts and search.ts hardcode success: true in transformResponse. The map.ts counterpart correctly propagates data.success. When the fastCRW API returns HTTP 200 with { success: false, error: "…" } (e.g., invalid URL or auth error), the scrape and search tools will still emit success: true with undefined output fields, masking the failure from downstream blocks. map.ts shows the correct pattern: return success: data.success and reflect it in the output envelope.
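A minimal sketch of the pattern the review recommends for scrape, assuming the response body carries `success` and `error` fields as described. `ScrapeApiBody`, `ScrapePayload`, and `toScrapeResult` are hypothetical names; the `data.data ?? data` unwrapping is taken from the diff fragment above, and the function works on parsed JSON instead of a `Response` so it stands alone.

```typescript
// Fields the scrape output exposes, per the diff fragment.
interface ScrapePayload {
  markdown?: string
  html?: string
  metadata?: Record<string, unknown>
}

// Assumed response body: payload either nested under `data` or at the top level.
interface ScrapeApiBody extends ScrapePayload {
  success?: boolean
  error?: string
  data?: ScrapePayload
}

interface ScrapeResult {
  success: boolean
  output: ScrapePayload
  error?: string
}

// Propagates API-level failures instead of always reporting success.
function toScrapeResult(data: ScrapeApiBody): ScrapeResult {
  if (data.success === false) {
    return { success: false, output: {}, error: data.error ?? 'Scrape request failed' }
  }
  const result: ScrapePayload = data.data ?? data
  return {
    success: true,
    output: { markdown: result.markdown, html: result.html, metadata: result.metadata },
  }
}
```

The same check would apply to search, with `data.data` passed through as the output.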

greptile-apps · 2026-06-13T17:28:21Z

+  transformResponse: async (response: Response) => {
+    const data = await response.json()
+
+    return {
+      success: true,
+      output: {
+        data: data.data,
+      },
+    }
+  },


Search always reports success: true on API-level failures

Same issue as scrape.ts — transformResponse always returns success: true without checking data.success. The map.ts tool in this same PR correctly checks data.success. If the search API returns { success: false, error: "…" } with HTTP 200, downstream blocks see a successful result with data: undefined rather than a proper error.

scrape, search, and crawl transformResponse hardcoded success: true, masking HTTP 200 responses with { success: false, error }. They now reflect data.success and surface the error, matching map.ts. Crawl additionally fails fast when job creation has no id, preventing a poll loop against /v1/crawl/undefined. Adds error-path tests.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit a18fa5d.

feat: add fastCRW tool block

2964aed

vercel Bot temporarily deployed to Preview June 13, 2026 17:21 Inactive

cursor Bot reviewed Jun 13, 2026

greptile-apps Bot reviewed Jun 13, 2026

vercel Bot temporarily deployed to Preview June 13, 2026 19:40 Inactive

Merge branch 'main' into feat/add-fastcrw

a18fa5d

vercel Bot temporarily deployed to Preview June 19, 2026 21:31 Inactive

cursor Bot reviewed Jun 19, 2026

Comment thread apps/sim/tools/crw/crawl.ts Outdated

fix(crw): remove unsupported include/exclude path params from crawl

bb6839d

vercel Bot temporarily deployed to Preview June 19, 2026 21:40 Inactive

us wants to merge 4 commits into simstudioai:main from us:feat/add-fastcrw
